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Summary 

O 

O This is an expose on the use of O'SuUivan penalised splines in contemporary semipara- 

^ metric regression, including mixed model and Bayesian formulations. O'SuUivan pe- 

nalised splines are similar to P-splines, but have an advantage of being a direct gener- 
alisation of smoothing splines. Exact expressions for the O'SuUivan penalty matrix are 
obtained. Comparisons between the two reveals that O'SuUivan penalised splines more 

I I closely mimic the natural boundary behaviour of smoothing splines. Implementation in 

modern computing environments such as Mat lab, R and BUGS is discussed. 

^ Keywords: Additive models, Markov chain Monte Carlo; Mbced models; P-splines; Smooth- 

ing splines. 

1 Introduction 

> 

^ Splines continue to play a central role in nonparametric and semiparametric regression 

modelling. Recent synopses include Eubank (1999), Gu (2002), Ruppert, Wand & Carroll 

^ (2003) and Denison, Holmes, Mallick & Smith (2002). In all but the last reference, smooth 

functional relationships are fitted using a large basis of spline functions subject to penali- 
sation. Up imtil the mid-1990s most literature on spline-based nonparametric regression 
was concerned with smoothing splines, and their multivariate extension thin plate splines, 

^P, where the penalty takes a particular form and the number of basis functions roughly 

equals the sample size (e.g. Wahba, 1990; Green & Silverman, 1994). However, in recent 

^ years, there has been a great deal of research on more general spline/penalty strategies, 

^ most of which use considerably fewer basis functions. Driving forces include: 

• more complicated models, often with several smooth functions; 

larger data sets, where smoothing and thin plate splines become computationally 
intractable, 

mixed model and Bayesian representations of smoothers that lend themselves to 
the use of established software, such as BUGS, Ime () in R and PROG mixed in 
SAS; provided the nimiber of basis functions is relatively low. 

Ruppert, Wand & Carroll (2003) summarise and provide access to many of these devel- 
opments. The term penalised splines has emerged as a descriptor for general spline fitting 
subject to penalties. 

O'SuUivan (1986, Section 3) introduced a class of penalised splines based on B-spline 
basis functions. O'SuUivan penalised splines are a direct generalisation of smoothing 
splines in that the latter arises when the maximal number of B-spline basis functions 
are included. Like smoothing splines, O'SuUivan penalised splines possess the attractive 
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feature of natural boundary conditions (e.g. Green & Silverman, 1994, p. 12). They have 
also become the most widely used class of penalised splines in statistical analyses due 
to their implementation in the popular R and S-PLUS function smooth . spline ( ) and 
associated generalised additive model software (e.g. the gam library in R; Hastie, 2006). 

Despite the omnipresence of O'Sullivan penalised splines, their use in semiparamet- 
ric regression contexts, particularly those involving mixed model and Bayesian represen- 
tations, is not very common. Recently, Welham, CuUis, Kenward & Thompson (2007) 
showed how most of the commonly used penalised splines can be treated within a single 
mixed model framework, although they did not work explicitly with the form given in 
O'Sullivan (1986). 

Our contributions in this paper are (a) providing an exact matrix expression for the 
penalty of O'Sullivan splines that allows implementation in a few lines of a matrix- 
based computing language, (b) comparison with their closest penalised spline relative, 
P-splines (Filers & Marx, 1996), which reveal some noticeable differences near the bound- 
aries, (c) demonstrate explicitly, including with R code, how O'Sullivan splines can be 
simply added to the mixed model-based regression armoury, and (d) investigate their ef- 
ficacy in Bayesian semiparametric regression using MCMC software such as BUGS and its 
variants. We conclude that the several attractive features of O'Sullivan penalised splines 
— smoothness, numerical stability, natural boundary properties, direct generalisation of 
smoothing splines — makes them a very good choice of basis in semiparametric regres- 
sion. 

Section |2] provides a brief description of O'Sullivan penalised splines. Comparison 
with P-splines is made in Section |3] Section |4] describes mixed model representation 
of O'Sullivan penalised splines and how they can be used in models that benefit from 
this representation. Issues concerning Bayesian penalised spline smoothing and Markov 
chain Monte Carlo are described in Section |5] O'Sullivan splines of general degree are 
described in Section |6] Closing discussion is given in Section |7] An appendix contains 
relevant R code. 

2 O'Sullivan Penalised Splines 

O'Sullivan penalised splines have already been described several times in the literature. 
A recent reference is the Chapter 5 Appendix of Hastie, Tibshirani & Friedman (2001). A 
brief sketch is given here for convenience. 

Consider the simplest nonparametric regression setting 



where (xj, G M x R. Suppose that an estimate of / is required over [a, b], an interval 
containing the x/s. For an integer < n let ki, . . . , kk+8 be a knot sequence such that 

a = Ki = K2 = K3 = K4 < < ■ ■ ■ < HR+i < f^K+b = I^K+Q = Kk+7 = Kk+8 = b 

and let . . . , Bk+4 be the cubic B-spline basis functions defined by these knots (see e.g. 
pp. 160-161 of Hastie et ah, 2001). Set up the nx {K + A) design matrix B with (i, k) entry 
Bik = Bk{xi) and the {K + A) x {K + 4) penalty matrix with (k, k') entry 



yi = f{xi)+ei, l<i<n, 



(1) 




Then an estimate of / at location x G R can be obtained as 



fo{x; A) = B^9o 



where P, 



o = 



(B^B + Afi)-iB^y, 



(2) 



Bx = [Bi{x), . . . , Bk+a{x)] and A > is a smoothing parameter. 
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Note that the cubic smoothing spline arises in the special case K = n and Kk+4 = x^, 
1 < k < n, provided the x/s are distinct (e.g. Green & Silverman, 1994, Section 3.6). Apart 
from giving a smooth (twice continuously differentiable) scatterplot smooth, /o(s A) has 
good numerical properties. The basis functions are bounded and so not prone to overflow 
problems. Moreover, B^B is 4-banded, which leads to 0(n) algorithms when K is close 
to n (e.g. Hastie, et al, 2001). In addition, fo{-', A) satisfies so-called natural boundary 
conditions, meaning that 

?o'(«; A) = ?o"(a; A) = A) = A) = 

and implying that /o(-;A) is linear over [0,^5] and [KK+4,b]. Figure [l] illustrates these 
natural boundary properties of /o(-; A) for data on ratios of strontium isotopes found in 
fossil shells and their age; see Chaudhuri & Marron (1999) for details. Also, /o(-; A) ap- 
proximates the least squares line as A ^ 00. The implication for mixed model smoothing 
is that the induced fixed effects component corresponds to straight line basis functions. 
Details are given in Section |4] 




Figure 1: Illustration of natural boundary properties of a 20-interior knot O'Sullivan penalised 
spline fit to the fossil data over the interval [85, 130] millions of years. The interior knots are 
shown as solid diamonds (♦). Inset: The 24 B-spline basis fiinctions. 

Computation of the design matrix B is usually quite easy. For example, B-splines are 
readily available in the Matlab, R and S-PLUS computing environments. Otherwise re- 
currence formulae (e.g. de Boor, 1978; Filers & Marx, 1996) can be called upon. However, 
computation of J2 requires some additional effort. In Section |6j while treating general de- 
gree O'Sullivan penalised splines, we derive an exact matrix algebraic expression for the 
corresponding penalty matrices. In the cubic case our theorem reduces to the expression: 

n = (B")^diag(w)B" (3) 

where B" is the 3{K + 7) x {K + 4) matrix with (i, j) entry B'-{xi), Xi is the ith entry of 
the vector 

/ Kl + K2 K2 + K3 Kk+7 + K-K+S \ 
X = \Ki, , K2, K2, , K3, . . . , KK+7, ^ , l^-K+S I • 
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and w is the 3{K + 7) x 1 vector given by 



w = (|(Ak)i, |(A/t)i, |(Ak)i, |(A/t)2, |(Ak)2, 1{Ak)2, 1{Ak)k+7, I{Ak)k+7, |(Ak)^+7) , 

where (Ak)^ = Kk+i — k^, 1 < A: < + 7. Result (jsj) is none other than Simpson's rule 
applied over each of the inter-knot differences. This is because each B'-'B'J function is 
piecewise quadratic. For commonly used values of K, ^ allows straightforward compu- 
tation of in matrix-based languages such as Matlab, R and S-PLUS. In the Appendix 
we demonstrate computation of in 4 lines of R code. 

Lastly, we mention knot choice. The R and S-PLUS function smooth . spline ( ) uses 

Kk — ( -^7——] th sample quantile of the x/s 

\K + 2 J 

where 

n n < 50 

100 n = 200 

140 n = 800 

200 + (n- 3200)^/^ n > 3200. 

Other values of n between 50 and 3200 are handled via a logarithmic interpolation. For 
many functional relationships, fewer knots are sufficient. Figure [T]is one example, where 
only K = 20 interior knots are used without compromising the quality of the fit. A 
common default in the penalised spline literature is K = mm{nu/4,35), where nu is 
the number of unique x/s (e.g. Ruppert et al, 2003). Ruppert (2002) discusses 'hi-tech' 
choice of K. The distribution of the knots, for a given K, may have some affect on the 
results. As mentioned above, smooth . spline ( ) uses quantile-based knots while e.g. 
Filers & Marx (1996) recommend equally-spaced knots. In most situations this effect will 
be minor. However, for either strategy, it is possible to construct regression functions and 
predictor variable distributions for which problems arise. More sophisticated knot place- 
ment strategies may help. For example, Luo & Wahba (1997) propose more sophisticated 
basis function reduction methods that could be adapted to the current context. 
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3 Comparison with P-splines 

The closest relatives of O'Sullivan penalised splines are the P-splines of Filers & Marx 
(1996). If the interior knots k^,. . . , kk+4 are taken to be equally-spaced then the family of 
cubic P-splines is given by ^ with the O replaced by D^^Dfc, where is the /cth-order 
differencing matrix. This differencing penalty corresponds to a discrete approximation 
to the integrated square of the kth derivative of the B-spline smoother. The choice k = 2 
leads to the cubic P-spline estimate 

fp{x; A) = B^dp, where 9p = (B^B + AD^D2)~^B^y, (4) 

having the property that /p(-; A) approaches the least squares line as A ^ oo. In this 
sense, (jij is the closest relative of /o(-; '^)- If the interior knots are equally-spaced then 
the bands in the interior rows are, up to multiplicative factors, as follows: 



O'Sullivan penalised splines (|2): 5, 0, -45, 80, -45, 0, 5 

Cubic P-splines; 2nd order diff. -4, 24, -60, 80, -60, 24, -4 

Figure |2] facilitates visual comparison of the two. It is seen that the differences are rela- 
tively small, although not negligible. 

What are the relative advantages of smoothers based on cubic P-splines and O'Sullivan 
penalised splines, or O-splines for short? Theoretical comparison between P-splines and 
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Figure 2: Comparison of near-diagonal entries of the penalty matrices for O' Sullivan penalised 
splines and cubic P-splines with k = 2 and equally-spaced interior knots. 

O-splines in terms of estimation performance, perhaps in the spirit of Hall & Opsomer 
(2005), would be ideal - although beyond the scope of the current paper. 

Filers and Marx (1996) partially justify use of P-splines rather than O'Sullivan splines 
based on simplicity of the P-spline penalty matrix. However, as seen from (|3]l, the penalty 
matrix needed for O-splines can be obtained straightforwardly. Furthermore the discrete 
approximation of P-splines requires equally-spaced knots which, depending on /, may 
not be desirable, 

A possible advantage of P-splines is the option of higher-order penalties, although 
the resulting smoothers can have erratic extrapolation behaviour. A possible advantage 
of O-splines is their direct relationship with time-honoured smoothing splines, and their 
attractive theoretical properties (e.g. Nussbaum, 1985). ^From the results described in 
Section|2]is clear that O-splines approach smoothing splines as K n. But how close are 
O-splines to smoothing splines for common (smaller) choices of K, and are they closer 
than P-splines with the same value of K and interior knots? To address these questions 
we conducted an empirical study based on the eighteen homoscedastic nonparametric 
regression settings in Wand (2000). For O-splines we used K = 100 equally spaced inte- 
rior knots with 4 repeated knots at each boundary as described in Section|2j However, for 
P-splines we used the knot sequence described in the Appendix of Filers & Marx (1996) 
which involves extending the knots beyond the boundary rather than repeating them. 
For each setting 200 samples were generated and smoothing spline estimates fs, with 
smoothing parameter chosen via generalised cross-validation, were obtained. We then 
computed fo and fp to have the same effective degrees of freedom as fs and recorded 
closeness measures d{fo, fs'-, ^) arid d{fp, fs; A) where 



We took A corresponding to the intervals (a,K5) (left boundary), {k^,kk+5) (interior), 
{kk+5, b) (right boundary) and (a, b) (total region) where the Kk denote the knots used 
for the O-spline fits. The Wand (2000) settings all involve predictor data within the unit 
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interval. We took (a, b) = (—0.1, 1.1) to assess behaviour beyond the range of the data. 
Wilcoxon tests on the 200 differences d{fo, fs'i^) — d{fp, fs] A) were carried out for each 
setting and choice of A. Apart from being distribution-free, Wilcoxon tests have the ad- 
vantage of being invariant to normalisation and whether differences or ratios are used. 
In all 72 cases O-splines were closer to smoothing splines than P-splines in the sense that 
the Wilcoxon p-value< 0.01. 

To appreciate the practical significance of these results we plotted the data and esti- 
mates at the 90th percentiles of each of the (i(/o, fs; A) and (i(/p, fs] A) samples, corre- 
sponding to relatively high discrepancies. Some examples are shown in Figure |3] 
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Figure 3: O-spline and P-spline fits compared with smoothing spline fits corresponding to the 
90th percentiles of the d{fo, fs', A) and d{fp, fs] A) samples; for two of the homoscedastic settings 
of Wand (2000). 

In the interior all estimates of the same data, and same degrees of freedom, are almost 
indistinguishable with the naked eye. However, big differences occur at the boundary. 
P-splines have a tendency to deviate from the natural boundary behaviour of smoothing 
splines. We also observed this phenomenom in the other 16 settings. Further study into 
this differing extrapolation behaviour would be worthwhile. We speculate that it comes 
from differences between the exact integral penalty and its discrete approximation near 
the boundary. 

4 Mixed Model Formulation 

There are several ways by which Uq in ^ can be expressed as a best linear unbiased 
predictor (BLUP) in a mixed model (e.g. Speed, 1990; Verbyla, 1994). However, from a 
software standpoint, the most convenient form is Vq = {P, u) where j3 and u are (empir- 
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ical) BLUPs of (3 and u in the mixed model 



y = X^ + Zu + e, 



u 

e 



N 








a^I 



(5) 



for some design matrices X and Z. An explicit expression for the BLUP in (|5| (e.g. Rup- 
pert. Wand & Carroll, 2003; Section 4.5.3) is 



u 



C^C + A 










I 



where C = [X|Z], I is the identity matrix with the same number of columns as Z. This 
'canonical form' can be achieved if a {K + 4) x [K + 4) linear transformation matrix L 
can be found such that C = BL and 









I 



The usual method for obtaining L is spectral decomposition (e.g. Nychka & Cummins, 
1996; Cantoni & Hastie, 2002; Welham et al, 2007). It follows from results in the smooth- 
ing spline literature (e.g. Speed, 1991, Section 6) that 

rank(ii) = K + 2. 

Hence, the spectral decomposition of is of the form = Udiag(d)U^ where U^U = I 
and d is a {K + 4) x 1 vector with exactly 2 zero entries and K + 2 positive entries. Let 
be the {K + 2) x 1 sub-vector of d containing these positive entries, and let Uz be the 
{K + 4) X [K + 2) sub-matrix of U with columns corresponding to positive entries of d. 



Then an appropriate linear transformation is L 
fixed and random effects design matrices: 



[U;,|Uzdiag(d; 



-1/2. 



X = BUx and Z = BU2diag(d 



-1/2. 
z ) 



This leads to the 



(6) 



However, following again from the aforementioned smoothing spline literature (e.g. Speed, 
1991, Section 6), BUx is a basis for the space of straight lines so the simpler specification 
X = [1 Xi]i<j<„ may be used instead without affecting the fit. The spline basis formed 
through spectral decomposition of more familiar bases is known as the Demmler-Reinsch 
basis (e.g. Nychka & Cummins, 1996). Figure |4] allows comparison of the original B- 
spline basis, corresponding to B, and the Demmler-Reinsch basis corresponding to Z. 
Notice the damping of the Z basis functions with increasing oscillation. This compen- 
sates for the fact that the penalty is a multiple of the identity matrix. 

In the Appendix it is shown how the R linear mixed model function Ime ( ) can be 
used to obtain /o(-; A) based on with Z given by For simple scatterplot smoothing 
there is little difference between this approach and direct use of smooth, spline ( ) , 
and the answers are equivalent if the knot sequence and A values are equal. The default 
choice of A differs: Ime ( ) uses restricted maximum likelihood (REML) to choose A, while 
smooth . spline ( ) uses generalised cross-validation (GCV). 

The main advantage of the mixed model formulation of penalised splines is the incor- 
poration into more complex models. Several examples are given in, for example, Rup- 
pert. Wand & Carroll (2003). We will briefly describe one of them here. Figure |5] displays 
a longitudinal data set on bone mineral acquisition in young females (source: Bachrach, 
Hastie, Wang, Narasimhan & Marcus, 1999). The data consists of spinal bone mineral 
density (SBMD) measurements on each of 230 female subjects aged between 8 and 27. 
Each subject is measured between one and four times. Let rij denote the number of mea- 
surements for subject i. The subjects have been divided into four ethnic groups: Asian, 
Black, Hispanic and White. 
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Figure 4: Comparison ofB-spline basis and Demmler-Reinsch basis for the fossil data example of 
Figure^ The interior knots are shown as solid diamonds (♦). 

A useful additive mixed model for these data is: 

SBMDjj = C/i+/(age.^-)+/3iBlacki+/32Hispanici+/?3Whitei+eij, 1 < i < 230, 1 < j < 

(7) 

where Ui i.i.d. A^(0, cr^) are random intercepts for each subject, and Blackj, HispaniCj and 
Whitej are ethnicity indicators, eij i.i.d. A^(0, a^) are random errors. More sophisticated 
models that account for, say, serial correlation could be entertained. 

O'Sullivan penalised splines can be used to fit (|7]| with the design matrices set up as 
follows. Based on the age^ values and appropriate knots, set up 

Z,pune = BU2diag(d2"^''^) 

analogous to the Z matrix of ^ for simple scatterplot smoothing. In the Appendix, when 
fitting data of this type, we use 15 interior knots corresponding to quantiles of the unique 
age values. Form 



X 



1 3-geii Blacki HispaniC]^ Whitei 



1 age^^^^ Blacki Hispanic^ Whitei 



1 age23oi Black23o Hispanic23o White230 



1 age23o 

,"230 ^-'-^'-■■'^230 Hispaiiic23o White23o 
Concatenate Z^ub, and Zspi^e to form 

Z = [Zjubj I Zspline] • 

The appropriate mixed model is then 

y = X/3 + Zu + £, Cov 



and Z 
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(8) 



8 



10 



15 20 25 



■D 
TO 
05 



^ 1.4 
o 

J3 



1.2 



1.0 



0.8 



0.6 





Hisp 


anic 

< 






wi- 


lite 






• 














1 — • • 








•• 








• 






















As 


an 






Bl£ 


jck 








> •- 




> 

• -• 


























• 



















"T 
10 



"T 
15 



"T 
20 



"T 
25 



1.4 



1.2 



1.0 



0.6 



age (years) 



Figure 5: The spinal hone mineral data. Lines connect measurements taken on the same subject. 

The Appendix contains R code for fitting this model. Note, in particular, that it cir- 
cumvents explicit specification of Zsutj- This is important for large longitudinal datasets. 



5 Bayesian Analysis and Markov Chain Monte Carlo 

A particularly attractive advantage of penalised splines, compared with smoothing splines, 
is the ease with which they can be fed into Markov Chain Monte Carlo (MCMC) schemes 
for fitting Bayesian semiparametric regression models - due to the reduction in the num- 
ber of basis functions. For simple scatterplot smoothing this involves the Bayesian ver- 
sion of (|5]|: 

y|/3,u,a2 ~ iV(X^ + Zu,a,2l), uja^ ~ N{^,al\) 

and suitable (usually diffuse) prior distributions for j9, o\ and cj^. However, the big ad- 
vantages of a Bayesian/MCMC approach are realised when handling complications such 
as measurement error (e.g. Carroll, Ruppert, Stefanski & Crainiceanu, 2006) and gener- 
alised responses (e.g. Zhao, Staudenmayer, CouU & Wand, 2006), which are hindered by 
intractable integrals in the likelihood. 

Crainiceanu, Ruppert & Wand (2005) focus on use of the MCMC package WinBUGS 
(windows version of BUGS, Spiegelhalter, Thomas & Best, 2000) for Bayesian penalised 
spline models. They reported that the choice of basis functions can have a substantial 
impact on the convergence of the chain. We decided to conduct some convergence checks 
for MCMC fitting of the regression model: 



logit{P(unionj = l|wagej)} = /(wagej) 



(9) 



with / estimated via O'Sullivan penalised splines. Here (wagej,unionj), \ <i < 534, are 
pairs of wage amounts (dollars per hour) and trade union membership indicators for a 
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sample of U.S. workers (source: Berndt, 1991). We expressed Q as the Bayesian logistic 
mixed model: 

logit{P(unioni = l|wagei)} = (X/3 + Zu)i, 1 < i < 534 

where X = [1 wageji<i<534 and Z = BUzdiag(d;j^^^), using the notation of Section [i] 
We used 15 interior knots with quantile spacing. 
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Figure 6: Fit of^ using O'Sullivan penalised splines. 
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1st quartile 



posterior mean; 0.085 
95% credible interval; 
(0.0513,0.126) 



2nd quartile 



3rd quartile 



posterior mean; 0.175 
95% credible interval: 
(0.129,0.234) 

posterior mean; 0.294 
95% credible interval: 
(0.224,0.378) 



Figure 7: Assessment ofMCMC convergence for O'Sullivan penalised spline estimation o/Q at 
each quartile of wage. The columns are: quartile of wage, trace plot of sample of corresponding co- 
efficient, plot of sample against 1-lagged sample, sample autocorrelation function, Gelman-Rubin 
diagnostic, kernel estimates posterior density and basic numerical summaries. 

Following the advice of Zhao et al. (2006) we used WinBUGS to generate chains of 
length 5000 after a burn-in of 5000 and applied a thinning factor of 5, resulting in posterior 
samples of size 1000. Also in keeping with the recommendations of Zhao et al. (2006) 
we placed diffuse priors on the fixed effect parameters and variance component: f3o, [3i 
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independent 7V(0, 10^) and the prior density of cj^ proportional to (cj2)^^ °^e^^/(^°°'^"), the 
Inverse Gamma distribution with shape and rate parameter both 0.01, after scaling the 
predictor to have unit variance. Zhao et al. (2006) found that the results can be sensitive 
to the choice of the Inverse Gamma hyperparameter. 

The pointwise posterior mean effect of wage on the probability of trade union mem- 
bership, together with 95% pointwise credible sets, is shown in Figure |6] Figure [7| allows 
assessment of convergence of the MCMC at each quartile of the wage sample and is seen 
to be excellent in each case. We also conducted convergence checks for larger logistic ad- 
ditive models involving up to 6 predictors and 3 smooth functions and found the mixing 
to be very good when O-splines were used. 

Several examples of semiparametric regression with WinBUGS, including code, are 
given in Crainiceanu et fl/.(2005) and Zhao et a/. (2006). 

6 General Degree Extension 

Cubic O'Sullivan penalised splines have a natural extension to general odd degree splines. 
Higher degree splines have a role to play when smoother curve estimates are required. 
This arises, for example, in feature significance methodology (e.g. Chaudhuri & Marron, 
1999; Hannig & Marron, 2006) where first and second derivatives of the fit are required. 

Return to the simple nonparametric regression setting ([T]) and let m be a general pos- 
itive integer. Form the knot sequence 

a = Ki = . . . = K2m < l^2m+l < ■ ■ ■ < K2m+K < l^2m+K+l = ■ ■ ■ = K4m+K = b 

and let i?2m-i,i, • • • , B2m-i,K+2m be the degree (2m — 1) B-spline basis defined by these 
knots. Order m O'Sullivan penalised splines then take the general form 

fo{x; m, A) = B2m-i,xi>o where Uq = (B^„_iB2,n-i + Aii('"))"^B^„_;^y. 

Here B2m-i is the n x {K + 4m) design matrix with (i, k) entry B2m-i,k{xi), 'B2m-i,x = 
[B2m-i,i{x) , . . . , B2m-i,K+2mix)] and ii^™") is the {K + 2m) x {K + 2m) penalty matrix 
with {k,k') entry 



In the special case where the interior knots coincide with the x/s, assumed distinct, 
/o(s "7, A) corresponds to the order m smoothing spline; i.e. the minimiser of 



(e.g. Schoenberg, 1964). 

We are now ready to state our result for exact computation of O'Sullivan spline penalty 
matrices: 





■b 



Theorem. The penalty matrix admits the exact explicit expression 

^{m) ^ (B(™))^diag(w)B(™) 



where B^™) is the (2m — 1){K + 4m — 1) x {K + 2m) matrix with (i, j) entry B^_^ j{xi) 
and w is a (2m — 1){K + 4m — 1) x 1 vector with ith entry Wi. The Xi and Wi values are 
obtained according to 



X(2m-l){i-l)+e+l — !<■£ + ^'hm/, ^^(2m-l)(£-l)+r+l — ^m/^m/' 
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for 1 < i < K + Am - 1, < £' < 2m - 2. Here, fori <k <K + 2m, hi^u = «^fc+i - Kfc 
and, for m>2, hm,k = (/«fc+i — ^k) / (2?n, — 2). Lastly, for all m> 1, 

t(t - 1) ■ ■ ■ (t - 2m + 2) , 



Proof. The (fc, k') entry of fi^™) is 



(m) 
kk' 



f 

J a 



B. 



(m) 

2m-l,fe 



(m) 

2m-l,k 



,{x) dx 



K+4m- 

E 

i=l 



B. 



(m) 
2m- 



■l,fcV-^J-D2m-l,A;' 



(10) 



Since -62™-! fc(^)' ^2m-i fc'(^) degree m—1 polynomials on each interval x e {ki, Kj+i) 

for 1 <i < K+Am—1 the function -82^-1 k(^)^2m-i fc'(^) ^ degree 2(m— 1) pol5momial 
on the same interval. The result follows by applying the Newton-Cotes integration (2m — 
l)-point rule (e.g. Whittaker & Robinson, 1967) to the right hand side of (10 1 which is 
exact for polynomials of degree 2(m — 1) or lower. 

□ 



Table [T] provides values of uJm,k for O'Sullivan polynomials up to degree 7. This, 
together with the Theorem, allows direct computation of penalty matrices of O'Sullivan 
splines for m < 4. Higher values of m require one-off calculation of the ujk,m through, 
say, a symbolic computation package such as Maple. 



m\k 





1 


2 


3 


4 


5 6 


1 


1 












2 


1/3 


4/3 


1/3 








3 


14/45 


64/45 


8/15 


64/45 


14/45 




4 


41/140 


54/35 


27/140 


68/35 


27/140 


54/35 41/140 



Table 1: Table ofLOm,k values for m < 4. 

Recall from Section |2] that, in the case of cubic O'Sullivan splines, Newton-Cotes in- 
tegration reduces to Simpson's rule and a simpler, more revealing, expression results in 
the shape of (|3]|. 

7 Closing Remarks 

Smoothing splines have a special place in nonparametric and semiparametric regression. 
They are based on simple and intuitive principles, have an attractive theory (e.g. Nuss- 
baum, 1985; Wahba, 1990; Eubank, 1994 and Solo, 2000) and possess good practical prop- 
erties such as natural boundary behaviour. Penalised splines, including P-splines, have 
gained popularity for reasons stated in the Introduction. However, proponents of pe- 
nalised splines have been viewed by some, especially in the smoothing spline commu- 
nity, as ignoring the benefits of that have been established for smoothing splines over the 
past few decades. O'Sullivan penalised splines, being a direct generalisation and closer 
approximation of smoothing splines, provide an attractive link between the two streams 
of semiparametric regression research and allow analysts to enjoy the best of both worlds. 
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Appendix: R implementation 

In this Appendix we provide R code for use of O'SuUivan penalised splines in the sim- 
plest semiparametric regression setting: scatterplot smoothing. The extensions to more 
complex models, such as those described by Ngo & Wand (2004) and Crainiceanu, Rup- 
pert & Wand (2005), is straightforward. We illustrate one of these extensions: additive 
mixed models. 

Direct scatterplot smoothing with user choice of smoothing parameter 

Obtain scatterplot data corresponding to environmental data from the R package latt ice. 
Set up plotting grid, knots and smoothing parameter: 

library (lattice) ; attach (environmental) 

X <- radiation ; y <- ozone" (1/3) 

a <- ; b <- 350 ; xg <- seq (a, b, length=101 ) 

numlntKnots <- 20 ; lambda <- 1000 

Set up the design matrix and related quantities: 

library (splines) 

intKnots <- quantile (unique (x) , seq ( , 1 , length= 

(numIntKnots+2) ) [-c(l, (numIntKnots+2 ) ) ] ) 
names (intKnots) <- NULL 
B <- bs (x, k;nots=intKnots, degree=3. 

Boundary . knots=c (a, b) , intercept=TRUE) 
BTB <- crossprod (B, B) ; BTy <- crossprod (B, y ) 

Create the matrix: 

formOmega <- function (a, b, intKnots) 
{ 

allKnots <- c (rep (a, 4 ) , intKnots, rep (b, 4 ) ) 

K <- length (intKnots) ; L <- 3*(K+8) 

xtilde <- (rep (allKnots, each=3) [-C (1, (L-1) , L) ] + 

rep (allKnots, each=3) [-c(l,2,L)])/2 
wts <- rep (diff (allKnots) , each=3) *rep (c (1, 4, 1) /6, K+7) 
Bdd <- spline . des (allKnots, xtilde, derivs=rep (2, length (xtilde) ) , 

outer . ok=TRUE) $design 
Omega <- t (Bdd*wts) %*%Bdd 

return (Omega) 

} 

Omega <- formOmega (a, b, intKnots ) 
Obtain the coefficients: 

nuHat <- solve (BTB+lambda*Omega, BTy) 

For large K the following alternative Cholesky-based approach can be considerably faster 
{0{K), because B^B + Xil is banded diagonal): 
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cholFac <- chol (BTB+lambda*Omega) 

nuHat <- backsolve (cholFac, f orwardsolve (t (cholFac) , BTy ) ) 
Display the fit: 

Bg <- bs (xg, knots=intKnots, degree=3, Boundary . knots=c (a, b) , intercept=TRUE) 
fhatg <- Bg%*%nuHat 

plot (x, y , xlim=range (xg) , bty=" 1" , type="n" , xlab=" radiation" , 
ylab="cuberoot of ozone",main=" (a) direct fit; user 
choice of smooth, par.") 

lines (xg, fhatg, lwd=2) 
points (x, y , lwd=2 ) 

Mixed model scatterplot smoothing with REML choice of smoothing parameter 
Obtain the spectral decomposition of il: 

eigOmega <- eigen (Omega) 

Obtain the matrix for linear transformation of B to Z: 

indsZ <- 1 : (numIntKnots+2) 

UZ <- eigOmega$vectors [ , indsZ ] 

LZ <- t (t (UZ) /sqrt (eigOmega$values [indsZ] ) ) 

Perform stability check: 

indsX <- (numIntKnots+3 ) : (numIntKnots+4 ) 
UX <- eigOmega$vectors [ , indsX] 

L <- cbind(UX,LZ) 

stabCheck <- t (crossprod (L, t (crossprod (L, Omega) )) ) 
if (sum (StabCheck" 2) > 1 . 0001* (numIntKnots+2 ) ) 

print ("WARNING: NUMERICAL INSTABILITY ARISING FROM SPECTRAL DECOMPOSITION") 

Form the X and Z matrices: 

X <- cbind (rep (1, length (x) ), x) 
Z B%*%LZ 

Fit using Ime ( ) with REML choice of smoothing parameter: 

library (nlme) 

group <- rep ( 1 , length (x) ) 

gpData <- groupedData (y~x | group, data=data . frame (x, y) ) 
fit <- Ime (y~-l+X, random=pdIdent ( ~-l+Z) , data=gpData) 

Extract coefficients and plot scatterplot smooth over a grid: 

betaHat <- f it$coef $f ixed 

uHat <- unlist (f it$coef $random) 

Zg <- Bg%*%LZ 

fhatgREML <- betaHat [1] + betaHat [ 2 ] *xg + Zg%*%uHat 

plot (x, y, xlim=range (xg) , bty=" 1 " , type="n" , xlab=" radiation" , 

ylab="cuberoot of ozone" , main=" (b) mixed model fit; 

REML choice of smooth, par.") 
lines (xg, fhatgREML, lwd=2 ) 
points (x, y, lwd=2 ) 



14 



(a) direct fit; user 
clioice of smootfi. par. 



(b) mixed model fit; 
REML choice of smooth, par. 




Figure 8: Plots obtained from execution of the first two chunks of code in this Appendix. 
Execution of the above code leads to Figure |8] 
Fitting an additive mixed model 

The spinal bone mineral density data of Bachrach et al. (1999) are not publicly available. 
Therefore we will illustrate fitting of additive mixed models using simulated data. For 
simplicity we will use two ethnicity categories rather than four. 

Generate data: 

set . seed (394600) ; m <- 230 ; nVals <- sample ( 1 : 4 , m, replace=TRUE) 
betaVal <- 0.1 ; sigU <- 0.25 ; sigEps <- 0.05 
f <- function (x) 

return{l + pnorm ( (2 *x-3 6 ) /5 ) /2 ) 
U <- rnorm (m, 0, sigU) 
age <- NULL ; ethnicity <- NULL 
Uvals <- NULL ; idNum <- NULL 
for ( i in 1 : m) 
{ 

idNum <- c (idNum, rep (i, nVals [i] ) ) 

stt <- runif (1, 8, 28- (nVals [i] -1) ) 

age <- c (age, seq ( stt , by=l , length=nVals [ i ] ) ) 

xCurr <- sample (c (0, 1) , 1) 

ethnicity <- c (ethnicity, rep (xCurr, nVals [i] ) ) 
Uvals <- c (Uvals, rep (U [i] , nVals [i] ) ) 

} 

epsVals <- rnorm ( sum (nVals ), , sigEps ) 

SBMD <- f(age) + betaVal*ethnicity + Uvals + epsVals 
Set up basic variables for the spline component: 

a <- 8 ; b <- 2 8; numlntKnots <- 15 

intKnots <- quantile (unique (age) , seq (0, 1, length= 
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(numIntKnots + 2 ) ) [-c (1, (numIntKnots + 2 ) ) ] ) 
Obtain the spline component of the Z matrix. 

B <- bs (age, knots=intKnots, degree=3. 

Boundary . knots=c (a, b) , intercept=TRUE) 
Omega <- formOmega (a, b, intKnots) 
eigOmega <- eigen (Omega) 
indsZ <- 1: (numIntKnots+2 ) 
UZ <- eigOmega$vectors [ , indsZ ] 
LZ <- t (t (UZ) /sqrt (eigOmega$values [indsZ] ) ) 
ZSpline <- B%*%LZ 

Obtain the X matrix: 

X <- cbind (rep (1, length (SBMD) ), age, ethnicity) 

Set up variables required for fitting via Ime ( ) . Note that the random intercept is taken 
care of via the tree identification numbers variable 'idNum', and that explicit formation 
of the random effect contribution to the Z matrix is not required. 

groupVec <- f actor ( rep ( 1 , length ( SBMD )) ) 

ZBlock <- list (list (groupVec=pdIdent ("ZSpline-l) ) ,list ( idNum=pdIdent ( " 1 ) ) ) 
ZBlock <- unlist (ZBlock, recursive=FALSE) 
dataFr <- groupedData (SBMD~ethnicity | groupVec, 

data=data . frame (SBMD, X, ZSpline, idNum) ) 
fit <- Ime (SBMD'-l+X, data=dataFr, random=ZBlock) 
betaHat <- f it$coef $f ixed 
uHat <- unlist (fit$coefSrandom) 
uSplineHat <- uHat [ 1 : ncol ( ZSpline) ] 

Plot the data and fitted curve estimates together: 

ng <- 101 ; ageg <- seq (a, b, length=ng) 

Bg <- bs(ageg, knot s=int Knot s , degree=3 , Boundary . knots=c (a, b) , intercept=TRUE) 
ZgSpline <- Bg%*%LZ 

plotMatrixO <- cbind (rep (1, ng) , ageg, rep (0, ng) , ZgSpline) 
fhatgREML <- plotMatrixO %*% c (betaHat, uSplineHat) 

xLabs <- paste ( "ethnicity =", as . character (ethnicity) ) 

pobj <- xyplot (SBMD~age I xLabs, groups=idNum, xlab="age (years)", 

ylab="spinal bone mineral density" , subscripts=TRUE, 

panel=f unction (x, y, subscripts, groups) 

{ 

panel. grid ; panel . superpose (x, y, subscripts, groups, 

type="b" , col="grey 60 " , pch=l 6) 
panelind <- any (ethnicity [ subscripts ] ==1 ) 
panel . xyplot (ageg, f hatgREML+panelInd*betaHat [ 3 ] , 
lwd=3, type="l", col="black" ) 

}) 

print (pobj ) 

Print approximate 95% confidence intervals for key parameters: 

print (intervals (fit) ) 

Execution of the above code should lead to an outcome similar to Figure |9] (the simulated 
data may differ between platforms). 
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Figure 9: Plot obtained from execution of the last chunk of code in this Appendix. 
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